Multi-Class Document Classification Using Lexical Ontology-Based Deep Learning

Authors

Abstract

With the recent growth of the Internet, the volume of data has also increased. In particular, the increasing amount of unstructured data makes that data difficult to manage. Classification is needed in order to use the data for various purposes. Since it is difficult to manually classify the ever-increasing volume of data for different types of analysis and evaluation, automatic classification methods are needed. In addition, classifying imbalanced multi-class data is a challenging task: as the number of classes increases, so does the number of decision boundaries a learning algorithm has to solve. Therefore, in this paper, an improved model is proposed that uses the WordNet lexical ontology and BERT to perform deeper learning on the features of text, thereby improving the classification effectiveness of the model. Classification success was observed to increase when WordNet's 11 general lexicographer files, which are based on synonym sets (synsets), syntactic categories, and logical groupings, were used for feature dimension reduction. In the experimental studies, word embeddings were first used without dimension reduction. Random Forest (RF), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) algorithms were then employed for classification. These experiments were then repeated with dimension reduction performed by WordNet. In addition to the machine learning models, experiments were conducted with a pretrained BERT model. The results showed that, on an unstructured, seven-class, imbalanced dataset, the highest accuracy, 93.77%, was obtained with our proposed model.
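The abstract describes reducing text features to WordNet lexicographer files before classification. The sketch below illustrates that idea under stated assumptions. It uses a tiny hand-written word-to-lexname table in place of a real WordNet lookup. The table `TOY_LEXNAMES` and the function `lexname_features` are hypothetical and not taken from the paper. Each document becomes a count vector with one dimension per lexicographer file, so the vector's length no longer depends on vocabulary size.

```python
from collections import Counter

# HYPOTHETICAL toy lookup standing in for WordNet. A real pipeline would map each
# word to the lexicographer file of one of its synsets; these entries are
# illustrative only and are NOT the 11 files selected in the paper.
TOY_LEXNAMES = {
    "car": "noun.artifact",
    "engine": "noun.artifact",
    "driver": "noun.person",
    "player": "noun.person",
    "match": "noun.event",
    "drive": "verb.motion",
    "run": "verb.motion",
    "win": "verb.competition",
    "score": "verb.competition",
}

# Fixed, ordered feature space: one dimension per lexicographer file.
CATEGORIES = sorted(set(TOY_LEXNAMES.values()))


def lexname_features(tokens):
    """Return counts of each lexicographer file among the given tokens.

    Tokens are lowercased; tokens missing from the lookup are skipped, so the
    vector length is len(CATEGORIES) whatever the vocabulary size.
    """
    counts = Counter(
        TOY_LEXNAMES[tok.lower()] for tok in tokens if tok.lower() in TOY_LEXNAMES
    )
    # Counter returns 0 for categories that never occurred.
    return [counts[c] for c in CATEGORIES]


if __name__ == "__main__":
    doc = "the player will run and score and the driver will drive the car".split()
    print(CATEGORIES)
    print(lexname_features(doc))  # [1, 0, 2, 1, 2]
```

With NLTK and its WordNet corpus installed, the toy table could be replaced by `wn.synsets(word)[0].lexname()`. Most words have several synsets, however, so some sense-selection rule is needed; taking the first synset is only the simplest choice. The reduced vectors would then be fed to classifiers such as RF, SVM, or MLP.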


Similar resources

Ontology-Based MEDLINE Document Classification

An increasing and overwhelming amount of biomedical information is available in the research literature mainly in the form of free-text. Biologists need tools that automate their information search and deal with the high volume and ambiguity of free-text. Ontologies can help automatic information processing by providing standard concepts and information about the relationships between concepts....

Content Based Document Recommender using Deep Learning

With the recent advancements in information technology there has been a huge surge in the amount of data available. But information retrieval technology has not been able to keep up with this pace of information generation, resulting in excessive time spent retrieving relevant information. Even though systems exist for assisting users to search a database along with filtering and recommending r...

Multi-Class Document Layout Classification using Random Chopping

This paper proposes a multi-class document layout classification/recognition system using a method called random chopping. A scanned document image undergoes text line extraction and is represented as a set of quadrilaterals for every pair of text lines. For compact representation, a dictionary of quadrilateral clusters is built beforehand, and a document image is then represented as a word occ...

Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning

Twitter should be an ideal place to get a fresh read on how different issues are playing with the public, one that’s potentially more reflective of democracy in this new media age than traditional polls. Pollsters typically ask people a fixed set of questions, while in social media people use their own voices to speak about whatever is on their minds. However, the demographic distribution of us...

Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification

Nowadays, malicious URLs are a common threat to businesses, social networks, net-banking, etc. Existing approaches have focused on binary detection, i.e. whether the URL is malicious or benign. Very little literature has focused on the detection of malicious URLs and their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...

Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13106139